Wiki

Clone wiki

inf225 / 2017 / ex / Exercise 4

Exercise 4 (Compulsory)

You objective in this exercise is to work with the code we've built in the lectures, and turn it into a (reasonably) finished implementation of a simple language.

When you're done, you should be comfortable with:

  • Passing around environments for context information
  • Building basic evaluators and typecheckers
  • Program execution with and without a store
  • The distinction between abstract and concrete syntax

You may also realise the reasoning behind some design choices in languages you know (such as why variables from the surrounding scope must be final in Java lambdas and anonymous classes).

Implementation Choices

Feel free to implement the solution in a language of your choice. Building the full Simpl language, with parser and all can be a bit of work, so it's fine to start from abstract syntax (e.g., the output from the SimplTypeChecker from the lectures, or something you write by hand). Example ASTs are at the end (reading terms should be a lot easier than parsing concrete syntax).

It may be a good idea to coordinate with other students, so we can compare the different solutions and technologies afterwards.

Suggested implementation languages/toolkits:

  • Rascal
  • Haskell
  • Scheme / Racket
  • Java (you can represent your abstract syntax with classes and inheritance; there are also libraries that let you do pattern matching and tree traversal)
  • ANTLR+Java (lets you easily create a parser)

(These would normally take quite some time to learn, but you can build a full language implementation, including IDE support.)

Tasks

  • You are free to make your own decisions (including doing parts in another language if you wish), and even simplify the tasks if you need to.
  • Once you're done, you show and explain your solution to Anya or Anna who will let you know whether it's approved or if you need further improvements.
  • Deadline: you should get one us to review your solution some time in the week of Oct 30th–Nov 3rd.
  • We may/will expand the hints section as we go along, based on your questions. Please ask at the lectures, by email or via Facebook message. Feel free to discuss with other students as well.

This is what you have to do. Read below for helpful information about how to do it.

  1. Retrieve the code for the Simpl language implementation from the lectures, and check that you can run it.
  2. Write an evaluator for abstract syntax trees of our language (as defined in SimplAST.rsc). It should follow the basic style of the evaluator from SimplEvaluator.rsc (i.e., a set of recursive functions with an environment passed around), except it works on the AST datatype instead of concrete syntax.
  3. Do one or more language extensions:
    • Our language only supports single-argument functions (though with partial support for multiple arguments in the AST, to handle built-ins operators). Make it work also with arbitrary-length argument lists.
    • Overloaded functions are nice. Make it so that the type checker selects functions not just based on name, but also based on the types of the arguments. In the AST, you can distinguish the functions including the types in the name (e.g., using f:int(int,int) interally instead of f).
  4. Anonymous functions (lambdas) are all the rage these days – even Java and C++ have them now. Add this to the language, and update the evaluator and typechecker. It's already in the grammar.
  5. EXTRA: Expand the language with imperative constructs such as sequencing stat1; stat2;, {}-blocks and variable assignment x = y;. You can call your language Impl, for example :). You can handle variable assignment by having each evaluator step return an updated environment. You need to carefully thread the environment through statement sequences and constructs such as if, to make sure the environment is propagated correctly. You'll learn more about this in the next lectures.
  6. EXTRA: You should probably expand the language a little bit, with other useful constructs. A print statement, for example, if you added imperative constructs. You should also consider changing the syntax, so there is a start symbol Program that allows you to have a list of list of function definitions before the expression to be evaluated (instead of having to make lots of nested let/let funs). E.g., so your programs can look like this:
    fun int fib(int n) = ...
    fun int fab(int n) = ...
    
    fib(fab(13))
    

Detailed hints

1.

You find the code at this Git URL: https://bitbucket.org/sle-uib/inf225.git

To import in Eclipse, choose File -> Import -> Git -> Projects from Git, choose Clone URI and enter the URL. Click next a few times. It'll ask you about which project to import, and the project name is inf225.h17 (is probably preselected). Hit finish, and the code should show up in your project explorer.

For example, these things should work in the Console (provided you start the console in the inf225.h17 project):

rascal>import Simpl;
ok
rascal>import SimplTypeChecker;
ok
rascal>typecheck((Expr)`let x = 5 in x end`, ())
...
rascal>import SimplEvaluator;
ok
rascal>eval((Expr)`let x = 5 in x end`, ())
...

2. Evaluator

(The hints here are for Rascal, but you may of course use any implementation language.)

Getting Started
  • Make a new Rascal module (File -> New), name it SimplASTEvaluator or something.
  • You probably need the same imports as in SimplEvaluator.rsc, plus you need import SimplAST itself.
  • Your evaluator functions should look something like this:
    public Value eval(Plus(e1, e2), Env env) { ... }
    
  • Copy relevant definitions for Name, Value and Env from SimplEvaluator.rsc
  • For testing, you can write programs directly in abstract syntax. For example, Let("x", Int(5), Apply(Builtin("int::+"), [Var("x"), Var("x")])).
Hints

Note that you're now working in abstract syntax, rather than the concrete syntax in the evaluator we had in the lectures. This means we no longer care about program text; there's no parsing going on, and we don't need a grammar. (We assume that another component in our system has taken care of parsing and converting to an abstract syntax tree.)

  • Values and patterns look like this: Int(5), Int(a), Times(a, Int(b)) and so on, rather than (Expr)`5`, (Expr)`<NUM a>` and (Expr)`<Expr a>*<NUM b>`.
  • You don't need to worry about converting numbers from strings to ints, since they are already ints in the AST.
  • You can also make eval functions with a normal argument list, rather than using pattern matching. E.g., int eval(ExprTree t, Env env), and then use a switch statement to deal with the various cases of AST nodes.
  • To support other types than int, you must update the Value type. You can pattern match on Values just like we did on concrete syntax. For example, if you have two values v1 and v2, you can check if they are both ints: if(<Int(i1), Int(i2)> := <v1, v2>) { return Int(i1+i2); }

You can write any program directly in the abstract syntax; it's just a bit more cumbersome than when you have a nice concrete syntax (though Lispers and Scheme/Racket people seem to enjoy it). You can obtain example ASTs from the typechecker (see the end of this document).

  • If you want to try and make an imperative (store-oriented, rather than value-oriented) language, there are two approaches:
    • For very simple updatable variables, you can get by with each evaluation step returning an updated environment. This is what Rascal does internally, and allows variables to change (i.e., be rebound to different values) without values/objects to change.
    • In the more advanced case, you need to have a storage location allocated to each variable. Your store can be a list of values, and whenever you declare a new variable you must find a new storage location for it.

3. Multiple arguments and overloading

  • For multi-argument functions, you also need to update the syntax. Remember that you can use {Expr ","}* to indicate a comma-separated list of zero or more arguments. You use the same syntax in patterns for your type-checking functions, e.g., typecheck(`<ID f>(<{Expr ","}* args>`), env). The arguments will then be a list, which you can process with a for-loop or list comprehension ([f(x) | x <- args]).
  • For overloading, the environment will contain a list or set of functions rather than a single one. You must first determine the argument types, and then go through the list of candidates and pick the one where the argument types match.
    • (If you're doing type checking directly on abstract syntax instead of concrete syntax, assume that the type of variables are not yet determined. E.g., that variables are Var("x", Unknown()) and you're supposed to resolve them to Var("x", Int()).)

4. Anonymous functions

Note that anonymous functions are essentially equivalent to making a regular function and then returning it. There's already a desugar function that does this translation, but you should make the evaluator and typechecker process lambdas natively.

For fun, once you have lambdas, you can remove the implementation of let, and instead desugar it to a lambda (the "body" or "scope" of the let) and an immediate call to it (with the initialiser expression) – functions and name binding are very closely related.

5. Imperative features

Getting Started
  • Read the stack frame chapter hand-out.
  • The point is that each variable should have a store location on the current function's stack frame. The stack frame has space for all local variables in a function; when a function is called (which would happen in e.g. an evaluator), space on the stack is made available for the stack frame, and local variables are accessed relative to a frame pointer (which points to the stack frame in memory).
  • Regarding store locations, there are two approaches: for a language with only local variables, you can keep track of the currently allocated store locations in the environment (e.g., just make a fake variable with a special name such as "__store_top", for instance). With global variables (for instance, storing a list of top-level functions), you can use a global counter.
Hints
  • If you have functions nested within function (which the language syntax allows), and the inner function can access the outer function's variables, this exercise becomes a lot harder. You don't have to solve this issue, but at least try to think about it – could this be one of the reasons why Java requires that local variables that are used in an inner class are marked as final?
  • What happens if you have allocated space for an object on the stack, and the function returns a reference to it?
  • A function bundled with its environment and variable storage (if any) is known as a closure. Languages such as Scheme allow functions to update non-local variables (e.g., variables from an outer scope). This makes the simple stack scheme we've used here unusable, since the life time of a variable will be dynamic rather than determined simply by the nesting of function calls. This is a useful feature – you can use it, e.g, to implement object orientation, but it can be tricky to implement. This is an example that makes use of closures with updatable variables:
    let int makeCounter() = // makes a function that gives a sequentially increasing number
       let counter = 0 in // a local counter variable
          let int f() = {
                counter = counter + 1; 
            return counter;
          } in
             f  // return the inner function
          end
       end
    end
    

AST Examples

Obtain the AST this way:

rascal>import ParseTree;
ok
rascal>import IO;
ok
rascal>import Simpl;
ok
rascal>import SimplTypeChecker;
ok
rascal>println("<typecheck(parse(#start[Expr], "42").top, ())>")
<Int(42),Int()>
ok
rascal>println("<typecheck(parse(#start[Expr], "2+2").top, ())>")
<Apply(Builtin("int::+"),[Int(2),Int(2)]),Int()>
ok
rascal>typecheck(parse(#start[Expr], |project://inf225.h17/src/fib.simpl|).top, ())
tuple[ExprAST,Type]: <LetFun( ...

fib.simpl

LetFun("fib",[Param(Int(),"n")],Let("bib",Var("fib"),
If(Apply(Builtin("int::\<"),[Var("n"),Int(2)]),Var("n"),Apply(Builtin("int::+"),[Apply(Var("fib"),
[Apply(Builtin("int::-"),[Var("n"),Int(1)])]),Apply(Var("bib"),[Apply(Builtin("int::-"),
[Var("n"),Int(2)])])]))),Apply(Var("fib"),[Var("X")]))

example1.simpl

Let("a",Int(15),Apply(Builtin("int::+"),[Let("a",Int(10),If(Int(0),Apply(Builtin("int::+"),[Apply(Builtin("int::+"),[Int(1),Int(2)]),Apply(Builtin("int::*"),[Int(3),Int(6)])]),Apply(Builtin("int::+"),[Var("a"),Var("a")]))),Var("a")]))

example2.simpl

Let("TWO",Int(2),LetFun("double",[Param(Int(),"x")],Apply(Builtin("int::*"),[Var("x"),Var("TWO")]),Let("TWO",Int(3),Apply(Var("double"),[Int(7)]))))

Updated